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Abstract: We offer simple solutions to three kinematic problems that occur in the folding of 
proteins. We show how to construct suitably local elementary Monte Carlo moves, how to close a 
loop, and how to fold a loop without breaking the bond that closes it. 
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Three Kinematic Problems 

Monte Carlo searches for the low-energy states of 
proteins pose three related kinematic problems: 

1. How does one design suitably local elementary 
Monte Carlo moves? 

2. How does one configure a main-chain loop between 
two fixed points? 

3. How does one fold a loop without breaking any of 
its bonds? 

The choice of elementary localized moves may be almost 
as important as the choice of the energy function. The 
loop problems occur if one adds or deletes residues in 
a backbone strand in order to model a homologous or 
mutant protein from a known x-ray structure. They 
also occur if one has a loop that is unresolved by x-ray 
crystallography or a primary sequence with two cysteines 
that might form a disulfide bond. 



Local Moves 

The positions ?i of the atoms of a protein arc lo- 
cal coordinates, but they are subject to constraints. The 
dihedral angles, fa and 4>i, describe the state of a protein 
more efficiently, but they are not local coordinates; a 
change in a dihedral angle near the center of a protein 
rotates half of the molecule, moving distant atoms 
farther than nearby ones. Such thrashing violates the 
conservation of angular momentum and of energy, and 
engenders steric clashes. Real proteins do not thrash; 
they wriggle. So if one uses the dihedral angles as 
coordinates, then one must craft elementary moves that 
are suitably local. 

How does one combine rotations about dihedral bonds 
so that the net motion is suitably local? This problem 
was addressed by Go and Scheraga [| and has since been 
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rotations are complicated. They are 3x3 orthogonal 
matrices with elements that are sines and cosines of the 
relevant angles. The nonlincarity of these trigonometric 
functions has held back progress on this problem. 

Yet every smooth function becomes linear when exam- 
ined at a small-enough scale. Rotations of infinitcsimally 
small angles are linear functions of those angles. Linear 
algebra is relatively simple. 

The change dr in the position f of an atom due to a 
rotation by a small angle e about a bond axis represented 
by the unit vector b is the cross-product of e b with the 
vector to the point r from any point c on the axis 



dr = el ix (r — c). 



(1) 



So the change dr due to n rotations by the small angles 
6i about the bonds bi is the sum 

n 

dr = ei U x (r - c\) 

n 

= y^ejbj X (f-a + a-Cj) (2) 

i=l 

(n \ n 

^2 e * ^ J x (r- a) + ^2 e f k x (a - Ci) 
i=i / i=i 

which is a rotation and a translation. The point a is 
entirely arbitrary; a convenient choice is the average po- 
sition a = (1/n) Ci of the points 

The rotation is less local than the translation because 
its effect is proportional to the length of the vector (r—a). 
But it is easy to make the net rotation 



^EiSj x (f-a) 



(3) 



,i=l 



vanish. A set of n vectors bi is said to be linearly depen- 
dent if there arc coefficients e.-, such that 



5>A = o. 



(4) 



i=l 
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Any set of n vectors bi in a space of d dimensions is lin- 
early dependent if n > d. Since the bond vectors bi are 
three dimensional, any four or more are linearly depen- 
dent. So if we use at least 4 bond vectors bi, then we 
always may find angles £j that make the sum |Q} vanish. 

We may find these angles by performing a singular- 
value decomposition. Every nxm real matrix B may be 
written as the product of an n x n orthogonal matrix U, 
an n x m matrix E, and an m x m orthogonal matrix V 
in the singular-value decomposition 



B = U E V. 



(5) 



The matrix E is zero except for its diagonal matrix ele- 
ments, which arc the non-negative singular values of the 
matrix B. 

To find small angles e,-, such that 



i=l 



(6) 



we set n = 4 and form a 3 x 4 matrix B whose columns are 
the 4 bond vectors bi. Its singular- value decomposition 
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Because the matrices U and V are orthogonal, their rows 
and columns are orthonormal vectors. In particular 



E V ^ V ^ 



(8) 



fe=i 



So if we take the small angles to be e t = x Va , where x 
is a scale factor, then the orthonormality (JSJ) of the rows 
of V will imply 
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Lapack is stable tested software that solves many 
problems of linear algebra. Its subroutine dgesvd per- 
forms singular-value decompositions in double-precision 
arithmetic. The call to 

dgesvd('N', 'A', 3, 4, B, 3, S, U, 1, V, 4, WORK, 402, INF) 
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FIG. 1: For the protein phosphoglycerate kinase, 16pk, the 
lines trace the mean values of the rmsd for 10 runs guided by 
algorithms that respectively use thrashing, fourfold thrashing, 
wriggling, sevenfold thrashing, sevenfold wriggling, and the 
sevenfold way. 



returns the matrix V. The small angles e$ may be taken 
to be 



(10) 



in which x is a random number in the range —5 < x < 8, 
and 8 is small enough for the small-angle approximations 
(HH2J) to be valid. We used 5 = 0.0125. WORK is a 
double-precision array of dimension LWORK, here taken 
to be 402. If the call is successful, then INF is returned 
as zero. 

We use the word wriggling to denote this way of can- 
celing the non-local effects of rotations. We tested our 
wriggling algorithm in Monte Carlo simulations of pro- 
tein folding against an algorithm in which successive di- 
hedral angles were varied independently, thrashing, and 
also against one in which the dihedral angles were varied 
in groups of four, fourfold thrashing. To separate kine- 
matic from dynamic issues, we used as a nearly perfect 
but highly artificial energy function, the rmsd between 
the main-chain atoms of our wriggling protein and those 
of its pdb file. 

Because of our use of the rmsd as an energy function, 
the proteins of our simulations are phantoms; they can 
pass through themselves. A real but approximate energy 
function would reject all moves into excluded volume; it 
therefore would reject many thrashing moves because of 
their large-scale motions. The use of the rmsd in our 
tests deprives wriggling of one of its key advantages over 
thrashing, namely that its localized motions are less likely 
to involve steric clashes. Thus the utility of wriggling in 
simulations with real energy functions may be greater 
than is indicated by our tests. 
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In our tests of various move sets, we let each algorithm 
fold 10 highly denatured coils of phosphoglycerate kinase 
(16PK.pdb, 415 aa), five of which were stretched. The al- 
gorithm made a list of the rotatable bonds which we took 
to be all the main-chain N-C a and C Q -C bonds, except 
for the N-C Q bonds of the prolines. In 16PK.pdb, there 
were Nb = 810 rotatable bonds. The thrashing code 
varied all Nb main-chain dihedral angles in each Monte 
Carlo sweep, but the wriggling and fourfold-thrashing 
codes, which apply their moves to groups of four bonds 
at a time, started with bonds 1-4 and ended with bonds 
(N B - 3)-N B . After 100,000 Monte Carlo sweeps, the 
wriggling, thrashing, and fourfold-thrashing algorithms 
respectively reduced their mean rmsd's to 1.63 ± 0.02, 
2.40±0.02, and 1.66±0.03 A. In Fig. □ the mean rmsd's 
are plotted against the number of sweeps. These results 
are similar to those we obtained earlier [l^ using C Q 
rmsd's. 

The use of n — 4 bonds 6, is the simplest way of can- 
celing the highly non-local effects of rotations, but it is 
not the best way. In further tests we found much lower 
rmsd's by using n = 7 bonds bi. We call this seven- 
fold wriggling. The matrix B is now 3x7, each of its 7 
columns being a bond vector bi. The call is to 

dgesvd('N', 'A', 3, 7, B, 3, S, U, 1, V, 7, WORK, 460, INF). 

The angles arc given by e, = x Vn where \x\ < S is a ran- 
dom number, and 6 is small enough that the small-angle 
approximations (PEJ) are valid. We used 5 = 0.0125. 

Sevenfold wriggling dropped the mean rmsd for 16pk 
to 0.62±0.02 A, as shown in Fig. 2] We also experimented 
with using more than seven bonds: n = 8 gave 0.62 A; 
n = 9 gave 0.70 A; n = 10 gave 0.76 A; and n = 20 
gave 1.30 A. Sevenfold thrashing gave a mean rmsd of 
1.53 ±0.03 A. 

By using seven or more bonds, we may cancel not only 
the net rotation 

n 

j2ak = o (ii) 

i=i 

but also the net translation 

n 

^e i b l x{a-c t ) = 0. (12) 
i=l 

To do this, we write these two conditions in terms of the 
six-vectors 

8 i =( i }i ^ (13) 
\b t x (a - a) J 

as 

n 

^e iSi = 0. (14) 

i=l 

Because any 7 or more 6-vectors Sj arc linearly depen- 
dent, such small angles e; always exist if at least 7 bonds 
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FIG. 2: For the protein myoglobin, 1a6m, the lines trace the 
mean values of the rmsd for 10 runs starting from a long a- 
helix and from a long /3-strand guided by sevenfold thrashing 
or the sevenfold way. 

bi are used. We call such moves strictly local wriggling. 
The matrix B now is 6 x 7, each of its 7 columns being 
a 6-vector Si, and the call is to 

dgesvd('N', 'A', 6, 7, B, 6, S, U, 1, V, 7, WORK, 850, INF). 

One might think (as we did) that strictly local wrig- 
gling is the ideal way to fold a protein. But although it is 
well suited to our third kinematic problem, the folding of 
a closed loop, it is very slow because it leaves the first and 
last backbone atoms unmoved to first order in the small 
angles e$. What does work well is the use of sevenfold 
wriggling and strictly local wriggling on alternate Monte 
Carlo sweeps along the protein. We call this technique 
the sevenfold way. It reduced the mean rmsd of 16pk to 
0.38 ± 0.02 A, as shown in Fig.HJ 

We also compared the sevenfold way to sevenfold 
thrashing on two other globular proteins — sperm-whale 
myoglobin (lA6M.pdb, 151 aa) and human muscle fatty- 
acid binding protein (lHMR.pdb, 131 aa). On these 
shorter proteins, we used the sevenfold way on all sets of 
seven contiguous bonds, fourfold wriggling on the three 
final sets of four contiguous bonds, and allowed arbitrary 
rotations about the last three bonds. For each protein, 
we made two denatured starting configurations, one that 
was a single long straight /3-strand and one that was a 
single long straight a-helix. In 10 runs of 100,000 sweeps, 
sevenfold thrashing reduced the mean main-chain rmsd's 
of the long a-helix and of the long /3-strand of 1a6m to 
1.55 ±0.01 and 0.89 ±0.03 A respectively. The sevenfold- 
way did much better, respectively reducing the rmsd's to 
0.12 ± 0.02 and 0.19 ± 0.05 A, as shown in Fig. El In the 
case of lHMR, the mean rmsd's of the long a-helix and 
of the long /3-strand respectively were reduced in 10 runs 
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FIG. 3: For the human muscle fatty-acid binding protein, 
Ihmr, the lines trace the mean values of the rmsd for 10 runs 
starting from a long a-helix and from a long /3-strand guided 
by sevenfold thrashing or the sevenfold way. 
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FIG. 4: For the protein 16pk, the lines trace the mean values 
of the rmsd for 10 runs guided by algorithms that respec- 
tively use thrashing and the sevenfold way with linear and 
exponential cooling. 



of 100,000 sweeps to 0.63 ± 0.02 and 0.63 ± 0.01 A by 
sevenfold thrashing and to 0.44 ± 0.02 and 0.43 ± 0.03 A 
by the sevenfold way, as shown in Fig. [3J 

Because our artificial energy function, the rmsd, is not 
directly related to an energy, our fixed-temperature sim- 
ulations were carried out at zero temperature. We did 
however test the sevenfold way against thrashing in runs 
with simulated annealing. In these Monte Carlo simula- 
tions, the temperature dropped cither linearly or expo- 
nentially with the sweep number from a very high T at 
the start of the simulation to T = at 80,000 sweeps. 
These runs finished with 20,000 sweeps at T = 0. As 
shown in Fig. 0] the sevenfold way reduced the mean 
rmsd of 16PK to 0.42 ± 0.01 A with exponential cooling 
and to 0.40 ± 0.02 A with linear cooling. Thrashing re- 
duced the mean rmsd to 2.01 ± 0.03 A with exponential 
cooling and to 1.58 ± 0.15 A with linear cooling. We did 
not try entropic sampling [20|. 

In these algorithms, the changes in the dihedral angles 
are small so that the small-angle approximations 
are valid; atoms are displaced in each move by no more 
than about 0.05 A. Such small moves might seem to 
imply a slow rate of convergence to minimum-energy 
states. But globular proteins typically are 50 A or less in 
diameter, and so in 1,000 moves an atom could traverse 
the protein. To test whether larger moves would give 
faster convergence, we made 10 thrashing runs with a 
cutoff 40 times bigger than our usual 5 = 0.0125. These 
runs with large-angle thrashing reduced the mean main- 
chain rmsd in 100,000 sweeps to 3.69 ± 0.06 A, which is 
to be compared to 2.40 ± 0.02 A with 5 = 0.0125. So 
bigger moves may not mean faster convergence, perhaps 
because in a partially folded protein, smaller moves are 



more likely than larger ones to lower the energy. 
To Close a Loop 

When modeling a homologous or mutant protein 
or an unresolved loop, one must configure a main chain 
between two fixed points. This problem also arises if one 
has a main chain with two cysteine residues, and one 
needs to make a disulfide bridge between them, forming 
a loop. 

Let us consider the case of a loop with a disulfide 
bridge. Provided the strand of backbone is long enough, 
we may change the dihedral angles of the residues of the 
strand between the cysteines so as to move the /3-carbon 
and the 7-sulfur of the second cysteine into the required 
positions opposite those of the first cysteine, which is 
held fixed. Let Cpo and 5* 7 o be the points to which the 
/3-carbon and the 7-sulfur should be moved, and let Cp 
and S-y be their present locations. 

We have seen in Eq.(j5J) that several small rotations 
amount to a net rotation and a net translation. We may 
choose the small angles of the rotations so as to correctly 
orient the Cp — Sy bond and to move it to the right po- 
sition. 

The required translation is 

t = - (Syo + Cpo - 5 7 - C^j . (15) 
The axis of the required rotation is 

x = (Sy - C^j x [Sy - C (M ) , (16) 
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and the tangent of its angle 9 is the ratio of the length ||x|| 
of this cross-product to the corresponding scalar-product 

d = (s 7 - C ) ■ (S 7 o - C fM ) , (17) 

that is, tan 9 = \\x\\/d. So the required angle of rotation 
is 

8 = atan2(||f||,rf) x. (18) 

Thus we must perform n > 6 small rotations, each of 
angle about bond hi, so that the net rotation is 

n 

i=l 

and the net translation is 

n 

t = J2 e i~ b i>< (a -Si). (20) 

i=l 

These two conditions may be written in terms of the 6- 
vectors 

Si=(t h (21) 
\biX (a - a) ) 

as 

i=l ^ ' 

We form a matrix B whose first n columns are the 
6-vectors Si and whose last column is the 6-vector 
( — 9, — t)- A call to dgesvd returns the matrix V, and 
the small angles are given by a suitably safe version 
of ti = V(j, i)/V(j, n + 1), where j > 6 is the row index 
with the largest value of \V(j, n+l)|. In our applica- 
tions of this transport algorithm, we set n equal to the 
number of rotatable main-chain bonds of the loop. 

The extracellular domain of human tissue factor 
(lBOY.pdb, 219 aa) has two disulfide bonds. The one 
between residues 186 and 209 closes a loop that has 44 
rotatable main-chain bonds. The transport algorithm 
with n = 44 closed this loop to less than O.OOOlA in 14 
sweeps, as shown in Fig. [S] 

To Fold a Loop 

When the transport algorithm closes a loop, the 
loop may well be of quite high energy with steric con- 
flicts. It is therefore necessary to vary the conformation 
of the loop without breaking it. Strictly local wriggling 
moves are well suited to this task. But we have found 
that the sevenfold way combined with the transport 
algorithm does a better job. We used the sevenfold 
way on all sets of seven contiguous bonds and applied 
the transport algorithm after every 200,000 sweeps of 
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FIG. 5: The transport algorithm closed the 44-bond loop of 
the protein Iboy in 14 sweeps. 
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FIG. 6: The sevenfold way needed about a million sweeps to 
reduce the main-chain and all-atom rmsd's of the loop of 44 
rotatable bonds in Iboy below 1 A. The spikes are caused by 
the application of the transport algorithm, which closes the 
loop after every 200,000 sweeps. 



the sevenfold way. As an energy function, we used 
the all-atom rmsd between the atoms of the folding 
loop and those of the pdb file. In 10 runs of 2,000,000 
sweeps from fully extended coils of the 44-bond loop 
of Iboy, this process reduced the loop's mean all-atom 
rmsd to 0.85 ± 0.14 A and its mean main-chain rmsd to 
0.42 ± 0.06 A, as shown in Fig.© 

The sevenfold way needed a million sweeps to reduce 
the mean all-atom rmsd of the 44-bond loop in Iboy to 
less than 1 A. One may do better by letting the pro- 
gram randomly choose how many bonds, 7 < n < 44, to 
fold and which row, 7 < j < n, of the matrix V of the 
singular-value decomposition JSJ) to use for the angles 
e, = xVji. A further advantage is gained by allowing 
suitable moves of the last six bonds, as was done for the 
proteins 1a6m and Ihmr. In 10 runs of 20,000 sweeps, 
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FIG. 7: The n-fold way reduced to zero the all-atom rmsd of 
the loop of 44 rotatable bonds in Iboy in all 20 runs of 25,000 
Monte Carlo sweeps. 
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FIG. 8: The n-fold way reduced to zero the all-atom rmsd 
of the loop of 60 rotatable bonds in human progastricsin, 
lHTR.pdb, in 7 of 10 runs of 30,000 Monte Carlo sweeps. 

this algorithm reduced the mean all-atom rmsd of the 44- 
bond loop of Iboy to 0.66 ± 0.22 A. The large standard 
error arose from the fact the rmsd got stuck at 1.326 A 
in 5 of the runs, but went to less than 0.02 A in the other 
5. 

Hoping to send the rmsd of every run to zero, we in- 
troduced a new Monte Carlo move. When a protein has 
folded to a globular form close to the native structure, a 
net translation may disrupt the folding protein as much 
as or more than a net rotation. So we added a new move 
in which the net translation, but not the net rotation, is 
set to zero: 

n 

t = e< k x (a - Ci) = 0. (23) 



One may enforce this condition by forming a matrix B 
whose n > 4 columns are the vectors hi x (a — Cj). The 
resulting n-fold way on successive sweeps sets to zero net 
translations, net rotations, and both the net translations 
and the net rotations. It reduced the rmsd of the 44-bond 
loop of Iboy to zero in 20 out of 20 runs in fewer than 
25,000 sweeps, as shown in Fig. 

The aspartyl protease human progastricsin (lHTR.pdb, 
329 aa) has three disulfide bonds. The one between 
residues 251 and 284 closes a loop that has 60 rotatable 
bonds. We denatured and stretched that loop and then 
used the n-fold way to fold it. By limiting the number n 
of simultaneously rotated bonds to 44, which by default 
was true in the case of Iboy, we found that we could 
drive the rmsd to zero in 7 out of 10 runs, as shown in 
Fig- El hi simulations with n set equal to 20, 30, 40, and 
50, we found that the rmsd went to zero in 4 or 5 out of 
10 runs. 



Summary 

We have presented simple solutions to three kine- 
matic problems that occur in the folding of proteins. We 
have shown how to construct suitably local elementary 
Monte Carlo moves, how to close a loop, and how to fold 
a loop without breaking any of its bonds. 

In future work, we intend to determine whether 
these kinematic algorithms improve the efficiency of 
finite-temperature Monte Carlo searches guided by 
realistic energy functions with implicit solvation and 
excluded volume. 
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